[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 #149461

badumbatish · 2025-07-18T06:18:30Z

Previously, even with SIMD enabled via -mattr=+simd128, the compiler could not use v128 to optimize loads and setcc of i128; it legalized them into consecutive i64 operations instead.

This PR adds support for setcc of i128 by converting the operands to v16i8 and reducing the comparison with anytrue or alltrue; consequently, this benefits memcmp of 16 bytes or more (when simd128 is present).

The optimization is enabled only when each comparison operand is an i128 that is either a load or an integer constant, the comparison code is EQ or NE, and the function does not have the NoImplicitFloat attribute.

Inspiration taken from RISCV's isel lowering.
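To make the description above concrete, here is a minimal C++ sketch of the two lowerings for a 16-byte equality test. It is a model of the semantics only, not the backend code: `Block16`, `equalViaI64Pairs`, and `equalViaByteLanes` are hypothetical names introduced for illustration, and the loops stand in for what the wasm instructions would compute.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 16-byte memory operand.
using Block16 = std::array<std::uint8_t, 16>;

// Models the scalar lowering: two i64 loads per operand,
// then i64.xor / i64.or / i64.eqz.
bool equalViaI64Pairs(const Block16 &A, const Block16 &B) {
  std::uint64_t ALo, AHi, BLo, BHi;
  std::memcpy(&ALo, A.data(), 8);
  std::memcpy(&AHi, A.data() + 8, 8);
  std::memcpy(&BLo, B.data(), 8);
  std::memcpy(&BHi, B.data() + 8, 8);
  return ((ALo ^ BLo) | (AHi ^ BHi)) == 0;
}

// Models the SIMD lowering: i8x16.eq yields one mask byte per lane,
// and i8x16.all_true is 1 only when every lane is nonzero.
bool equalViaByteLanes(const Block16 &A, const Block16 &B) {
  int AllTrue = 1;
  for (std::size_t I = 0; I < A.size(); ++I) {
    std::uint8_t Lane = (A[I] == B[I]) ? 0xFF : 0x00;
    if (Lane == 0)
      AllTrue = 0;
  }
  return AllTrue != 0;
}
```

Both functions should agree on every pair of inputs; the SIMD form simply needs one load per operand instead of two.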

badumbatish · 2025-07-18T06:24:43Z

Edit: this is resolved.

I'm trying out this PR but I think I encountered a blocker. The issue pops up with this reduced test case from the test case stest_f64i64 in WebAssembly/fpclamptosat.ll, produced from llvm-reduce.

I'm not sure how to reconcile this?

...

badumbatish · 2025-07-31T21:26:47Z

Alright, with Luke's pointer to PR #114517, I've tried a different approach: rather than making i128 legal everywhere, I only allow 16-byte loads via enableMemCmpExpansion, and instead of modifying the i128 load directly, I hook into setcc.

llvmbot · 2025-07-31T23:27:34Z

@llvm/pr-subscribers-backend-webassembly

Author: Jasmine Tang (badumbatish)

Changes

Fixes #149230

Full diff: https://github.com/llvm/llvm-project/pull/149461.diff

3 Files Affected:

(modified) llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp (+58-2)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp (+2-1)
(modified) llvm/test/CodeGen/WebAssembly/memcmp-expand.ll (+8-14)

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index cd434f7a331e4..ee16f7bf9133d 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -3383,8 +3383,61 @@ static SDValue TryMatchTrue(SDNode *N, EVT VecVT, SelectionDAG &DAG) {
   return DAG.getZExtOrTrunc(Ret, DL, N->getValueType(0));
 }
 
+static SDValue
+combineVectorSizedSetCCEquality(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
+                                const WebAssemblySubtarget *Subtarget) {
+
+  SDLoc DL(N);
+  SDValue X = N->getOperand(0);
+  SDValue Y = N->getOperand(1);
+  EVT VT = N->getValueType(0);
+  EVT OpVT = X.getValueType();
+
+  ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(2))->get();
+  SelectionDAG &DAG = DCI.DAG;
+  // We're looking for an oversized integer equality comparison.
+  if (!OpVT.isScalarInteger() || !OpVT.isByteSized() || OpVT != MVT::i128 ||
+      !Subtarget->hasSIMD128())
+    return SDValue();
+
+  // Don't perform this combine if constructing the vector will be expensive.
+  auto IsVectorBitCastCheap = [](SDValue X) {
+    X = peekThroughBitcasts(X);
+    return isa<ConstantSDNode>(X) || X.getOpcode() == ISD::LOAD;
+  };
+
+  if (!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y))
+    return SDValue();
+
+  // TODO: Not sure what's the purpose of this? I'm keeping here since RISCV has
+  // it
+  if (DCI.DAG.getMachineFunction().getFunction().hasFnAttribute(
+          Attribute::NoImplicitFloat))
+    return SDValue();
+
+  unsigned OpSize = OpVT.getSizeInBits();
+  unsigned VecSize = OpSize / 8;
+
+  EVT VecVT = EVT::getVectorVT(*DCI.DAG.getContext(), MVT::i8, VecSize);
+  EVT CmpVT = EVT::getVectorVT(*DCI.DAG.getContext(), MVT::i8, VecSize);
+
+  SDValue VecX = DAG.getBitcast(VecVT, X);
+  SDValue VecY = DAG.getBitcast(VecVT, Y);
+
+  SDValue Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, CC);
+
+  SDValue AllTrue = DAG.getZExtOrTrunc(
+      DAG.getNode(
+          ISD::INTRINSIC_WO_CHAIN, DL, MVT::i32,
+          {DAG.getConstant(Intrinsic::wasm_alltrue, DL, MVT::i32), Cmp}),
+      DL, MVT::i1);
+
+  return DAG.getSetCC(DL, VT, AllTrue, DAG.getConstant(0, DL, MVT::i1), CC);
+}
+
 static SDValue performSETCCCombine(SDNode *N,
-                                   TargetLowering::DAGCombinerInfo &DCI) {
+                                   TargetLowering::DAGCombinerInfo &DCI,
+                                   const WebAssemblySubtarget *Subtarget) {
   if (!DCI.isBeforeLegalize())
     return SDValue();
 
@@ -3392,6 +3445,9 @@ static SDValue performSETCCCombine(SDNode *N,
   if (!VT.isScalarInteger())
     return SDValue();
 
+  if (SDValue V = combineVectorSizedSetCCEquality(N, DCI, Subtarget))
+    return V;
+
   SDValue LHS = N->getOperand(0);
   if (LHS->getOpcode() != ISD::BITCAST)
     return SDValue();
@@ -3532,7 +3588,7 @@ WebAssemblyTargetLowering::PerformDAGCombine(SDNode *N,
   case ISD::BITCAST:
     return performBitcastCombine(N, DCI);
   case ISD::SETCC:
-    return performSETCCCombine(N, DCI);
+    return performSETCCCombine(N, DCI, Subtarget);
   case ISD::VECTOR_SHUFFLE:
     return performVECTOR_SHUFFLECombine(N, DCI);
   case ISD::SIGN_EXTEND:
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
index 52e706514226b..08fb7586d215e 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
@@ -147,7 +147,8 @@ WebAssemblyTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
 
   Options.AllowOverlappingLoads = true;
 
-  // TODO: Teach WebAssembly backend about load v128.
+  if (ST->hasSIMD128())
+    Options.LoadSizes.push_back(16);
 
   Options.LoadSizes.append({8, 4, 2, 1});
   Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
diff --git a/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll b/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
index 8030438645f82..c6df6b50693fa 100644
--- a/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
+++ b/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s  -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s
+; RUN: llc < %s  -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s
 
 target triple = "wasm32-unknown-unknown"
 
@@ -132,19 +132,13 @@ define i1 @memcmp_expand_16(ptr %a, ptr %b) {
 ; CHECK-LABEL: memcmp_expand_16:
 ; CHECK:         .functype memcmp_expand_16 (i32, i32) -> (i32)
 ; CHECK-NEXT:  # %bb.0:
-; CHECK-NEXT:    i64.load $push7=, 0($0):p2align=0
-; CHECK-NEXT:    i64.load $push6=, 0($1):p2align=0
-; CHECK-NEXT:    i64.xor $push8=, $pop7, $pop6
-; CHECK-NEXT:    i32.const $push0=, 8
-; CHECK-NEXT:    i32.add $push3=, $0, $pop0
-; CHECK-NEXT:    i64.load $push4=, 0($pop3):p2align=0
-; CHECK-NEXT:    i32.const $push11=, 8
-; CHECK-NEXT:    i32.add $push1=, $1, $pop11
-; CHECK-NEXT:    i64.load $push2=, 0($pop1):p2align=0
-; CHECK-NEXT:    i64.xor $push5=, $pop4, $pop2
-; CHECK-NEXT:    i64.or $push9=, $pop8, $pop5
-; CHECK-NEXT:    i64.eqz $push10=, $pop9
-; CHECK-NEXT:    return $pop10
+; CHECK-NEXT:    v128.load $push1=, 0($0):p2align=0
+; CHECK-NEXT:    v128.load $push0=, 0($1):p2align=0
+; CHECK-NEXT:    i8x16.eq $push2=, $pop1, $pop0
+; CHECK-NEXT:    i8x16.all_true $push3=, $pop2
+; CHECK-NEXT:    i32.const $push4=, 1
+; CHECK-NEXT:    i32.xor $push5=, $pop3, $pop4
+; CHECK-NEXT:    return $pop5
   %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
   %res = icmp eq i32 %cmp_16, 0
   ret i1 %res

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

lukel97

The PR title should probably be something like "Expand memcmp for 16 byte loads with simd128", since this PR also enables it in WebAssemblyTargetTransformInfo.cpp

llvm/test/CodeGen/WebAssembly/simd-setcc.ll

lukel97

LGTM!

Just for the PR title, I would say Combine i128 to v16i8 for setcc since it's technically a combine, not legalization.

And make sure to flesh out the PR description with a few sentences about how ExpandMemcmp can expand larger 128 bit loads, but they're emitted as i128s and we need to combine them into v16i8 types for efficient lowering.
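As a rough illustration of why adding 16 to `LoadSizes` matters, the sketch below greedily covers a memcmp length with the available load sizes. This is a simplified model and an assumption on my part, not the actual ExpandMemCmp algorithm: it ignores `AllowOverlappingLoads` and `MaxNumLoads`, and `planLoads` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: greedily covers `Size` bytes with the largest
// available load sizes, mirroring the order in which the TTI hook
// populates Options.LoadSizes (16 first when simd128 is present,
// then 8, 4, 2, 1).
std::vector<std::size_t> planLoads(std::size_t Size, bool HasSIMD128) {
  std::vector<std::size_t> LoadSizes;
  if (HasSIMD128)
    LoadSizes.push_back(16);
  LoadSizes.insert(LoadSizes.end(), {8, 4, 2, 1});

  std::vector<std::size_t> Plan;
  for (std::size_t LoadSize : LoadSizes) {
    while (Size >= LoadSize) {
      Plan.push_back(LoadSize);
      Size -= LoadSize;
    }
  }
  return Plan;
}
```

Under this model, a 16-byte memcmp becomes a single 16-byte load per operand with simd128 and two 8-byte loads without it. The 16-byte load is emitted as an i128, which is why the setcc combine into v16i8 is needed for efficient lowering.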

dschuff · 2025-08-13T00:10:53Z

It looks like this change has caused a test failure on Emscripten's test suite: the first memcmp in the neon test.
Sorry, I haven't had a chance to reduce it or investigate. If you compile that file with Emscripten (em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js, or compile the preprocessed source), you should be able to reproduce it.

badumbatish · 2025-08-13T07:05:23Z

thanks Derek, I'll revert and investigate this

… for 16 byte loads with simd128" (#153360)

Reverts #149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the Emscripten test suite has failed. This PR applies a revert so I can take a closer look at it.

Test case link: https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp
Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js`
Original comment report: #149461 (comment)

badumbatish · 2025-08-13T23:30:14Z

I tried adding some simple print debugging to this and found something a bit weird (or interesting): if I print the memcmp result before the assertion, the assertion no longer fails.

For example, with the test loop in the first test modified as follows, it no longer aborts:

        for (size_t i = 0 ; i < (sizeof(test_vec) / sizeof(test_vec[0])) ; i++) {
                int32x4_t a = vld1q_s32(test_vec[i].a);
                int32x4_t b = vld1q_s32(test_vec[i].b);
                int32x4_t r = vaddq_s32(a, b);
                int32_t r_[4];
                vst1q_s32(r_, r);
                printf("At %zu\n", i);
                printf("Byte of r_  : %02X %02X %02X %02X\n", r_[0], r_[1], r_[2], r_[3]);
                printf("Byte of test: %02X %02X %02X %02X\n", test_vec[i].r[0], test_vec[i].r[1], test_vec[i].r[2], test_vec[i].r[3]);
                // comment or uncomment the following line
                printf("Memcmp result: %d\n\n", memcmp(r_, test_vec[i].r, sizeof(int32_t) * 4));
                assert(memcmp(r_, test_vec[i].r, sizeof(int32_t) * 4) == 0);
        }

If I comment out the line that prints the memcmp result, I get the following error:

Testing NEON Wasm SIMD
At 0
Byte of r_  : 8A64C799 484D47E1 9BBF3942 B38F111F
Byte of test: 8A64C799 484D47E1 9BBF3942 B38F111F
Aborted(Assertion failed: memcmp(r_, test_vec[i].r, sizeof(int32_t) * 4) == 0, at: test/neon/test_neon_wasm_simd.cpp,58,test_simde_vaddq_s32)

The program exits successfully (no assertion error) when the memcmp result is printed, with the following log.
Note: the commit used by build.wasm is the last commit in this PR before merging.

 em++ test/neon/test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o temp.js -v  && node temp.js
 "/Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/clang++" -target wasm32-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -D__SSE__=1 -D__ARM_NEON__=1 -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -O2 -msimd128 -v -c test/neon/test_neon_wasm_simd.cpp -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/test_neon_wasm_simd_0.o
clang version 22.0.0git (git@github.com:badumbatish/llvm-project.git 9d2b041c6d4d41d18f89f19f54c6fcef68c5e106)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin
Build config: +assertions
 (in-process)
 "/Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/clang-22" -cc1 -triple wasm32-unknown-emscripten -O2 -emit-obj -disable-free -clear-ast-before-backend -main-file-name test_neon_wasm_simd.cpp -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -target-feature +simd128 -fvisibility=hidden -debugger-tuning=gdb -fdebug-compilation-dir=/Users/jjasmine/Developer/igalia/emscripten -target-linker-version 1167.4.1 -v -fcoverage-compilation-dir=/Users/jjasmine/Developer/igalia/emscripten -resource-dir /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/lib/clang/22 -D EMSCRIPTEN -D __SSE__=1 -D __ARM_NEON__=1 -isysroot /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten/c++/v1 -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1 -internal-isystem /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/lib/clang/22/include -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include -fdeprecated-macro -ferror-limit 19 -fmessage-length=154 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fignore-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/test_neon_wasm_simd_0.o -x c++ test/neon/test_neon_wasm_simd.cpp
clang -cc1 version 22.0.0git based upon LLVM 22.0.0git default target arm64-apple-darwin24.5.0
ignoring nonexistent directory "/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten/c++/v1"
ignoring nonexistent directory "/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/compat
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/lib/clang/22/include
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/clang --version
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/wasm-ld -o temp.wasm /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/test_neon_wasm_simd_0.o -L/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten -L/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/src/lib -lGL-getprocaddr -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/tmphto8ux8elibemscripten_js_symbols.so --strip-debug --export=_emscripten_stack_alloc --export=__wasm_call_ctors --export=emscripten_stack_get_current --export=_emscripten_stack_restore --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=main --export-if-defined=__main_argc_argv --export-table -z stack-size=65536 --no-growable-memory --initial-heap=16777216 --no-entry --table-base=1 --global-base=1024
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/llvm-objcopy temp.wasm temp.wasm '--remove-section=.debug*' --remove-section=producers --remove-section=name
 /Users/jjasmine/Developer/igalia/emsdk/node/22.16.0_64bit/bin/node /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/tools/compiler.mjs -
 /Users/jjasmine/Developer/igalia/emsdk/upstream/bin/wasm-opt --strip-target-features --post-emscripten -O2 --low-memory-unused --zero-filled-memory --pass-arg=directize-initial-contents-immutable temp.wasm -o temp.wasm --mvp-features --enable-bulk-memory --enable-bulk-memory-opt --enable-call-indirect-overlong --enable-multivalue --enable-mutable-globals --enable-nontrapping-float-to-int --enable-reference-types --enable-sign-ext --enable-simd
 /Users/jjasmine/Developer/igalia/emsdk/upstream/bin/wasm-opt --strip-target-features --post-emscripten -O2 --low-memory-unused --zero-filled-memory --pass-arg=directize-initial-contents-immutable temp.wasm -o temp.wasm --mvp-features --enable-bulk-memory --enable-bulk-memory-opt --enable-call-indirect-overlong --enable-multivalue --enable-mutable-globals --enable-nontrapping-float-to-int --enable-reference-types --enable-sign-ext --enable-simd
 /Users/jjasmine/Developer/igalia/emsdk/node/22.16.0_64bit/bin/node /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/tools/acorn-optimizer.mjs /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.js JSDCE --minify-whitespace -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.jso1.js
 /Users/jjasmine/Developer/igalia/emsdk/node/22.16.0_64bit/bin/node /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/tools/acorn-optimizer.mjs /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.js JSDCE --minify-whitespace -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.jso1.js
Testing NEON Wasm SIMD
At 0
Byte of r_  : 8A64C799 484D47E1 9BBF3942 B38F111F
Byte of test: 8A64C799 484D47E1 9BBF3942 B38F111F
Memcmp result: 0

At 1
Byte of r_  : DA07F2F6 7D20533A AC17DF8C 945FA5F0
Byte of test: DA07F2F6 7D20533A AC17DF8C 945FA5F0
Memcmp result: 0

At 2
Byte of r_  : E6FA39AC 1A631C8C EBC479FA 277E2320
Byte of test: E6FA39AC 1A631C8C EBC479FA 277E2320
Memcmp result: 0

At 3
Byte of r_  : 3A5C20AD 3452BE3A 591F1838 187E9D39
Byte of test: 3A5C20AD 3452BE3A 591F1838 187E9D39
Memcmp result: 0

At 4
Byte of r_  : 49985D0F 567EEA1C 3BAD9E00 F9542D3A
Byte of test: 49985D0F 567EEA1C 3BAD9E00 F9542D3A
Memcmp result: 0

At 5
Byte of r_  : AD084A90 36028635 5E70B023 1556C5DC
Byte of test: AD084A90 36028635 5E70B023 1556C5DC
Memcmp result: 0

At 6
Byte of r_  : 570C5B21 48E0DE0 9961FDBD AEAEB8C2
Byte of test: 570C5B21 48E0DE0 9961FDBD AEAEB8C2
Memcmp result: 0

At 7
Byte of r_  : 2BE4B64B 812F71C3 331A906F C0E1C947
Byte of test: 2BE4B64B 812F71C3 331A906F C0E1C947
Memcmp result: 0

Success!

lukel97 · 2025-08-14T05:13:27Z

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

+                                   DL, MVT::i32),
+                   Cmp});
+
+  return DAG.getSetCC(DL, VT, Intr, DAG.getConstant(0, DL, MVT::i32), CC);


We're accidentally negating the result, this should be

Suggested change

-  return DAG.getSetCC(DL, VT, Intr, DAG.getConstant(0, DL, MVT::i32), CC);
+  return DAG.getSetCC(DL, VT, Intr, DAG.getConstant(0, DL, MVT::i32), ISD::SETNE);

I should have caught this in review earlier, sorry! You should open up another PR that reverts the revert and include this fix in it

can confirm this doesn't trigger the assertion on the neon test, ty for the keen eyes Luke!

…CC (#153703)

This PR reapplies #149461.

In the original `combineVectorSizedSetCCEquality`, the result of the setcc was negated because the returned setcc reused the same condition code, leading to wrong logic. For example, with

```llvm
%cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
%res = icmp eq i32 %cmp_16, 0
```

the original PR produces all_true and then also compares the result for equality with 0 (using the same SETEQ in the returned setcc), so it effectively computes icmp ne. Instead, the returned setcc should have used SETNE: all_true returns 1, which is then compared ne 0, and that is equivalent to icmp eq.
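The condition-code mix-up described above can be reproduced in a small C++ model. This is a sketch under stated assumptions, not the backend code: `Block16`, `CondCode`, `reduceLanes`, `combineOriginal`, and `combineFixed` are hypothetical names, and the pairing of SETEQ with all_true and SETNE with any_true is assumed from the PR description's mention of anytrue and alltrue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Block16 = std::array<std::uint8_t, 16>;
enum class CondCode { SETEQ, SETNE };

// Lane-wise compare followed by the reduction intrinsic. Assumed pairing:
// all_true for SETEQ, any_true for SETNE. Returns 0 or 1, like the
// wasm intrinsics.
int reduceLanes(const Block16 &A, const Block16 &B, CondCode CC) {
  bool All = true;
  bool Any = false;
  for (std::size_t I = 0; I < A.size(); ++I) {
    bool Lane = (CC == CondCode::SETEQ) ? (A[I] == B[I]) : (A[I] != B[I]);
    All = All && Lane;
    Any = Any || Lane;
  }
  return (CC == CondCode::SETEQ) ? All : Any;
}

// Original combine: the final setcc reuses CC, i.e. setcc(Intr, 0, CC).
bool combineOriginal(const Block16 &A, const Block16 &B, CondCode CC) {
  int Intr = reduceLanes(A, B, CC);
  return (CC == CondCode::SETEQ) ? (Intr == 0) : (Intr != 0);
}

// Fixed combine: the final setcc is always setcc(Intr, 0, SETNE).
bool combineFixed(const Block16 &A, const Block16 &B, CondCode CC) {
  return reduceLanes(A, B, CC) != 0;
}
```

In this model, `combineOriginal` returns false for two identical blocks under SETEQ, which matches the symptom in the Emscripten test where the assertion on `memcmp(...) == 0` fired even though the bytes were equal. The SETNE path is unaffected because the reused code already happens to be SETNE.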


badumbatish requested review from dschuff and lukel97 July 18, 2025 06:24

[WebAssembly] Add simd support for memcmp

45ad537

badumbatish force-pushed the wasm_teach_load branch from 8432eea to 45ad537 Compare July 31, 2025 21:22

badumbatish marked this pull request as ready for review July 31, 2025 23:27

llvmbot added the backend:WebAssembly label Jul 31, 2025

badumbatish changed the title ~~[WebAssembly] [Draft] Legalize i128 to v2i64~~ [WebAssembly] Legalize i128 to v2i64 for setcc Aug 1, 2025

lukel97 reviewed Aug 4, 2025

Addresses PR reviews

1ae947f

lukel97 reviewed Aug 6, 2025

badumbatish added 2 commits August 7, 2025 11:27

Addresses shortcomings, add more tests

1934453

Remove redundant include

0b7a92f

badumbatish changed the title ~~[WebAssembly] Legalize i128 to v2i64 for setcc~~ [WebAssembly] Legalize i128 to v16i8 for setcc Aug 7, 2025

lukel97 reviewed Aug 8, 2025

Fix shortcomings in setcc ne

a180a57

badumbatish changed the title ~~[WebAssembly] Legalize i128 to v16i8 for setcc~~ [WebAssembly] Legalize i128 to v16i8 for setcc, expand memcmp for 16 byte loads with simd128 Aug 11, 2025

lukel97 approved these changes Aug 11, 2025

badumbatish changed the title ~~[WebAssembly] Legalize i128 to v16i8 for setcc, expand memcmp for 16 byte loads with simd128~~ [WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 Aug 11, 2025

Addresses nits

9d2b041

dschuff approved these changes Aug 12, 2025

badumbatish merged commit 348f01f into llvm:main Aug 12, 2025
9 checks passed

badumbatish mentioned this pull request Aug 13, 2025

Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" #153360

Merged

lukel97 reviewed Aug 14, 2025

badumbatish added a commit to badumbatish/llvm-project that referenced this pull request Aug 14, 2025

Reapply llvm#149461

3892b0c

lukel97 mentioned this pull request Aug 15, 2025

[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC #153703

Merged
